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1 LIMIX methods and implementation 



LIMIX is a flexible mixed model framework, allowing to efficiently formulate a variety of different mixed 
model analyses. Complex covariance functions to define random effects can be combined from simpler 



building blocks, using addition and multiplication as basic operations (Section 1.1). Inference in these 



models can be performed by gradient-based optimisation of the marginal likelihood or a regularized ver- 



sion thereof (Section|1.2[). Section 1.4 describes how the identical framework can be extended to perform 



efficient inference in the special case of matrix variate data and Section 1.5 outlines the software imple- 
mentation and exemplifies the basic usage. 

In the most general formulation, we assume that the data y is multivariate normal distributed with 
mean \i and covariance matrix S 

y~N(» e ; Eg), (1.1) 

where 6 is the set of model parameters. In linear mixed models \iq typically is a linear function \iq = Xf3, 
parametrized by a fixed design matrix X and fixed-effect parameters (3. The covariance matrix Sq is 
parametrized by individual variance components or covariance functions, corresponding to random effects 
in the model. In this instance, the set of model parameters 6 is given by the union of the fixed effects (3 
and the variance parameters. 

1.1 Kernel methods for defining composite covariance functions 

Defining property of a valid covariance matrix is a positive semi-definite symmetric matrix. In LIMIX 
we utilize a key insight from the kernel machines literature that allows to construct complex covariance 
matrices by combining a set of basic covariance models. First, we define a set of base covariance functions 



used in LIMIX (Section 1.1.1). In Section 1.1.2, we then define basic operations to combine two base 
covariance matrices A and B using sum and multiplication operators, which obey positive semi-definiteness 
of the resulting matrix C. By iteratively applying these operations, it is possible to build complex 
covariance matrices, thereby allowing for extensive modeling flexibility. 

1.1.1 Base covariance models 

LIMIX provides implementations for a range of covariance matrices, where the most important examples 
are defined here. A more rigorous treatment of covariance functions including the definition of more 
extensive examples can be found in [T]. Each covariance is parameterised by a set of parameters 9, giving 
rise to a semi-positive definite covariance matrix C{9). For inference, the matrix derivatives of C w.r.t Oi 



are needed, see also Section 1.2 



Fixed covariance matrix 



parameter oi > 0: 



If the covariance A between the samples is observed, inference requires to merely learn a single amplitude 

fin 

C(0) = a 2 g A, %jj=A (1.2) 



Here, 6 = [a 2 ]. 
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1 LIMIX methods and implementation 
Linear covariance function 

The linear covariance function can be derived by marginalizing out the weights in a random effect model 
X(3, assuming /3 ~ A/"(0, cr?J). This results in a covariance of the form C(8) = a^XX T , where again 
6 = [(Jg}. In genetics, this covariance can be used to model polygenic effects if X corresponds to genome- 
wide SNP data, e.g. [21 [3]. 

Free form covariance model 

In practice, the covariance between phenotypes is not observed and hence its parameters need to be 
inferred from data. The most general form is the free form covariance, which is a general P x P matrix 
that obeys the semi positive-definiteness constraints. To this end, we parametrize the covariance using 
cholesky factors, as any semi-positive definite matrix has a valid cholesky decomposition. 



C(0) = LL T , ^ = L8(L) T + 8(L)L T (1.3) 

where L is a lower triangular matrix of covariance parameters, i.e. 0 = [Loo, L\q, Lh, . . . , Lq u , L nn ]. The 
gradient with respect to a matrix field Ljj can be obtained by applying the chain rule -j^- = §37 ^Z^ 7- 

Low-rank covariance model 

We can restrict the phenotypes to lie in a lower-dimensional linear subspace, by using a low-rank param- 
eterization of the covariance matrix 

C(0) = ZZ T , g^ = 2Z (1.4) 

where Z is a P x K matrix, with K < P, and the parameters are its entries, i.e. 6 = [Zqq, Zqi, . . . , Zqi~, ■ ■ ■ , 
Z n Q, . . . , Z n k\. The gradient with respect to an entry Zij can then again be obtained by using the chain 
ri] i P 9C_ _ dCdz_ 

Diagonal covariance matrix 

The diagonal covariance matrix is often used to model the noise between the phenotypes. By restricting the 
matrix to be diagonal, we implicitly assume that the noise is independently, but not identically distributed 

BC 

C(9) = diag(d), Qj = I, (1-5) 

where 6 = [d± , . . . , d n ] . 

1.1.2 Composite covariance functions 

First, the sum of two covariance matrices is a covariance matrix 

A + B = C. (1.6) 

Additive combinations of covariance matrices allow for variance decomposition and can be interpreted as 
multiple independent random effects. Second, the point-wise product of two covariance matrices is a valid 
covariance matrix, 

AQB = C (1.7) 



and the Kronecker product of two covariances matrices is as well (see also Section 1.4) 



A® B = C. (1.8) 
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1.2 Inference and parameter estimation 



The gradient of the composite covariance functions can be obtained by applying iteratively the sum and 
product rule 

§- e ^B) = §- e A + §- e B (1.9) 

|(A0B) = ^(A) 0B + A0^(B) (1.10) 

§e {A ® B) = 6% {A) ® B + A ® §e {B) ■ (L11) 
1.2 Inference and parameter estimation 
1.2.1 Maximizing the model likelihood 

We perform parameter estimation by maximizing the log likelihood. The gradient of the log likelihood is 
optimized using LBFGS |4J. 



1 9 (y - ne 



y-M'^ 1 qo ' (1-12) 



Repeat optimizations from independent starting points and informative initializations are employed to 
improve convergence of the optimization procedure. 

1.2.2 Regularization 

Estimating the covariance matrix between pheno types is hindered by the large parameter space: for 
free- form covariance matrices, the number of parameters is growing quadratically with the number of 
phenotypes, while the number of new data points is only growing linearly. If not accounted for, this will 
lead to overfitting, i.e. the model does not only explain the signal of the data but also the noise, which in 
turn leads to a bad generalization behavior and a loss of interpretability. A natural way to prevent this is 
to add a regularization term over the covariance matrix 

mm - logAf ( y \ jig ; Eg ) + Reg(S e ) . (1.13) 

data fit regularizer 

While the first term maximizes the fit between the data and the model, the second term acts as a 
penalizer that prevents the model from becoming too complex. LIMIX makes available a sum-of-squares 
penalty on the off-diagonal entries of the covariance matrix Reg(5]e) = A]T\_^ (S^)^- 2 , or on its inverse 

Reg(Xe) = A Ylijtj (^0 X ) ' wnere A is the trade-off parameter between fitting the data and regularizing. 

In multi-trait models, in particular for larger numbers of traits (eQTL analysis in yeast) we considered 
the latter. Both this penalizations do not penalize the diagonal elements, to retain an unbiased estimate 
of the marginal heritabilities. From the Bayesian perspective, the regularization term is equivalent to an 
isotropic Gaussian prior, with zero mean. 

Out of sample prediction In practice, it is often hard to find the right trade-off between model fit and 
regularization. We used out of sample prediction accuracy to find the appropriate A. Predictions for new 
data points can be carried out by conditioning on the observed data 



p(y*\y, 6) =M (s/> 0 (X*) + S 0 (X*,X)S 0 1 (y - Me ); 

V e (X*,X*) - E 0 (X*,X)Eg 1 E 0 (X ! X*)) , (1.14) 

where fj,g(X*) is the mean function on the test inputs, Jjg(X,X)* depicts the covariance between the 
training and the test samples, and 'Sg(X* , X*) between the test samples. 
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1 LIMIX methods and implementation 



1.3 Kronecker product identities 

Let A be a N x M matrix, and B be a P x Q matrix. The Kronecker product A ® B is a iVP x MQ 
matrix and denned as follows 

/ A U B ... A ln B\ 
A®B = : : (1.15) 

\A m iB . . . A mn B J 

Due to its block structure, the Kronecker product has a number of nice properties that speed up 
inference [5l |6]. In particular, the dot product between two Kronecker products is a Kronecker product 
again 



( A ® B) (C ® D) = AC ® BD, (1.16) 

where C G R MxR and D G M Qx5 . It can be evaluated in 0(N MR + PQS) time, while the naive runtime 
is 0(NPMQRS). 

Let vec be an operation that concatenates the columns of a N x M matrix into a vector of length N ■ M. 
The product of Kronecker product and a vectorized matrix can be computed efficiently 

(A®B)vec(Y)=vec(BYA T ) (1.17) 

bringing the runtime down to 0(mm(PQM + PMN, QMN + PQN)) from O(NPMQ). 

Let UaSaU\ be the eigenvalue decomposition of A £ M. NxN and UbSbU^ the eigenvalue decompo- 
sition of B G M. PxP . The eigenvalue decomposition of the Kronecker product A®B can be synthesized 
by the eigenvalue decompositions of its components 

A® B = (Ua <8> £/ B ) (S A 8) 5 B ) (*7a T ® t/ B T ) (1.18) 

leading to a runtime reduction from 0(N s P 3 ) to 0(N 3 + P 3 ). 



1.4 Efficient inference for matrix variate mixed models 

In Section [l.l[ we gave a general overview over existing covariance functions. We now consider the special 
case in which the covariance matrix can be written as a sum of two Kronecker products and the fixed 
design matrix also exhibits Kronecker structure 




(1.19) 



Here, Y denotes the N x P data matrix with N samples and P phenotypes. For the fixed effect j, the 
matrix Bj G M MjXDj denotes the fixed-effects, Aj G M. PxM i is the design matrix of the column effects 
and Xj G M. NxD i of the row effects. We define R(Oji) as the signal row covariance of the data matrix and 
C{9c) as the signal column covariance matrix. Similarly, the noise column covariance matrix is given by 
E(#x). For the sake of clarity, we further assume that the noise row covariance matrix is the identity /, 
the extension to an arbitrary noise row covariance matrix is straightforward. For notational convenience, 
we will also drop the dependence of the covariance matrices on additional hyperparameters from now on. 

The merits of this class of models lie in its tractability: as we show in the following, efficient inference 
can be performed in 0(N 3 + P 3 ) time and in 0(N 2 + P 2 ) space, whereas an arbitrary covariance matrix 
of size NP has runtime 0(N 3 P 3 ) and a memory requirement of 0(N 2 P 2 ). 
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1.4 Efficient inference for matrix variate mixed models 



1.4.1 Log likelihood evaluation 

The log likelihood of the data is given by: 

log£ = -^log(2vr) - i log|C ® it + E® I\ - ^vec(Y r ) T (C ® R + E ® I)' 1 vec(Y r ), 



(1.20) 



where Y r is the residual phenotype, after the fixed effects have been subtracted from the data: 

J 

vec(l^) = vec(Y) - ^ (Aj ® Xj) vec(Bj) 
i=i 

veci Y-^XjBjA] 

3=1 



(1.21) 



Let U-£,SsU-£ T be the eigenvalue decomposition of E. As shown in 0 [3], we can whiten the covariance 
matrix in a two-step procedure. First, we whiten the noise by factoring it out. Second, we exploit the fact 
that the eigenvalue decomposition of a Kronecker product is compatible with a constant diagonal matrix 
to whiten the remaining covariance matrix: 



K = C <g> A + E&I 



C^R + U-sSsU^®! 



c 



fL18| 



£/ s S S 2 (gill IUcScU^UrSrU^ + I®!) {S^u^ 1 <g> I 



(1.22) 



where U^S^U^, is the eigenvalue decomposition of C. The log determinant can then be written as 



log |C ® .R+ E ® I| |AB| i AMB| log\S x ®I\+log\S 6 ®S R + I®I\ (1.23) 



\A®B\=\A\ F -\B\' 



P P N 

Nj2\ 0 gSy } \p,p]+^2^2log(S e \p,p]S R [n,n] + ^) (1-24) 

p=l p=l n=l 



We can evaluate the squared form efficiently as follows 
vec(i;) T (C ® + S ® J)- 1 vec(Y r ) 



JTT22] | 



vec(i;) T [(E/ s S s i£/ e 0 (5 a ® S R + J ® I) {u^S^U^ ® ITj)] vec(F r ) 



(UvS^Uc^Ur) \ec(Y r ) 



U < kSv-2U±®U R )vec(Y r 



(S^® S R + I ® I) 



vec 



vec 



vec 



U R Y r UvSv-2Uc 
U^Y r UxSjs~^U 6 
where D is a N x P matrix defined by the entries 

D[n,p] 



SU ® S 1 ^ + J ® I") vec 



U^YrU-sSv-zUc 



D®URY r UvSv-*U e 



1 



Sg\p,p] ® 5fi[n,n] + 1' 



(1.25) 



(1.26) 
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1 LIMIX methods and implementation 

The inverse of the covariance term times a vector is therewith: 



(C ® R + £ ® I)- 1 vec(Y r ) = (iTsSe-sL^ ® J7 R 



vcc 



DQU^YrU-sS-s-iUc 



[1.27) 



1.4.2 Estimation of the covariance parameters 

We want to evaluate the gradient with respect to a particular covariance parameter 6 G {Oc,Br,0y:} ■ 
The derivative consists of the derivative of the determinant term and the derivative of the squared form. 
Each of these is given separately for the row covariance parameter 9 r G Or. The gradients for the column 
covariance parameters can be derived analogously and are omitted for brevity. 

We start with the gradient of the log determinant: 



d_ 
W r 



log | C (8) -R + E® I\ 



\i:[(C R+E iy l [C®^-R 



tr 



UvS^Uc ®u R )(s d ®s R + i®i) [ms^Uv 1 ® u R 



(AB)- 1 = B- 1 A~ 1 
tr(AB)=tr(BA) 
\1.16) 
def(C) 
tr(AB)=vec(A) T vec(B) 



tr [ms^Uv 1 ®u R 



-1 



tr ( (Sc^Sr + I® I) (UnS&UeQUn) c« 



S R + I®I) (UvSv2Ur®U R ) [c 



_d_ 



d_ 

Mr 



R 



R) [u^s^u^ ®ul 



I r ( ( S c S R -I- I I) ( U c S»-W x ' CITv S s - - U c Ur [ J- . 



R)Ur 



tr l(S 6 ® 5 R + I ® I) ( U ] d CU d ® U R 
diag (Sg ® 5k + J ® I) 



R)Ur 



diag (%) ® diag UrT [—R)Ur 



d_ 

80' 



(1.28) 



The gradient of the squared form is given by: 



89, 



-vec(i;) T (C ® R + £ ® J)" 1 vec(i;; 
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-vec(r r ) 1 (C ® R + £ ® J)" 1 ( C ® ^RJ (C ® R + £ 0 iy 1 vec(Y r ) 



vec(r r ) (Lr s S s 5?7 (5 ®i7 K ) ( C ® — R ) (U£.Sv*U± ® U R ) vec ( Y r 



(1.16) 



dcf(C) 



-vec ( "K 



UlSjT^UlCUvS^Uc ®Ul(^-R) Ur 



vec ( 1^ 



-vec ( 



vec ( "K 



—vec r r vec 



(1.29) 
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1.4 Efficient inference for matrix variate mixed models 



1.4.3 Estimation of the fixed effects 

Gradient Evaluation We first evaluate the gradient with respect to a single fixed effect 

d 

[dB k ] a>i 

-vec(Y r ) T (C®fl + S® J^vec ( ) 

\[0-£>k\a,bJ 



^vec(Y r ) T (C®fi + S® iy 1 vec(Y r )^ 



i . 



-vec (Y r y vec (u£ {[X k ] tta [A k ]J b ^ U^S^U 6 

tr(AB)=vec(A) T vec(B) / ~ T T r. ]Tn c i,, 

tr(ABC)=tr(CAB) 



-ix [[Akl^UjtS^UcYr U R [X k ] ha 

tr(A)=tr(A T ) _[ Xfc] T ^^T^-i^T^^ & (L3Q) 

When stacking together the derivatives for all a and 6, the gradient with respect to all entries of B k 
follows as 



^~vec(Y r ) T (C & R+H (£> iy 1 vec(Y r )\ 



xJUrYtUJ.S^Uj: 1 A k (1.31) 



Closed form maximum likelihood estimates First, we rewrite the mean function by concatenating the 
Kronecker products of the design matrix 

& = [A 1 ®X 1 ,...,Aj®Xj] (1.32) 

and concatenating the fixed effects 

'Vec(.Bi) 

(1.33) 

where $ is a NP x DM matrix and (3 is a DM x 1 vector, with D = Dj and M = Mj- It is easy 
to show that ^2j=i (Aj ® Xj) vec(Bj) = */3. 

By setting the gradient of the log-likelihood with respect to the weight vector to zero, we can then 
obtain a closed form solution of the maximum- likelihood estimate (3 m ■ 




d 



i (vec (Y) - $/3) T (C ® R + £ ® I) - 1 (vec (Y) - $/3)^ = 0 (1.34) 



d(3 M 

$ T (C ® + £ ® iy 1 $>(3 M -$ T (C®fi+S® I) -1 vec (Y) = 0 (1.35) 
$ T (C ® i? + £ ® J)" 1 $/3 = $ T (C ® -R + S 0 I) -1 v ec (V) (1.36) 
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1 LIMIX methods and implementation 



(3 M = ( $ T (C 0 R + £ 0 J)" 1 $ ) $ T (C ® .R + £ ® J)" 1 vec (Y) 



loft term 



right term 



(1.37) 



We first compute the right term with respect to the j th block. 



(Aj 0 Xj) T (C 0 + £ 0 J) -1 vec (F) 

(A,- 0 Xj) T (UvSjT*U e 0 Up) vec (D 0 E/j^E/eSsT^c 



1h3 



vec 



(1.38) 



By concatenating the blocks, we obtain 



$ T (C 0 # + <t 2 J) 1 vec (Y") 



vec [XfU R [DQU^YrU-sS^-Wc) UlS^'W^Ax 



Yec{X 1 J U R [DQU B Y r U i: S^—2U d )UlS^—2U^Aj 



;i.39) 



For the left term, we need to invert the block matrix $ T (C 0 + £ 0 J) 1 $ G l flMxDM . We start 
with computing the (i,j) th block involving the i th and the j th fixed effects: 



|T722| 



fTi6) 



Aj 0 Xj) 



(A, 0 X) T (C 0 -R + £ <S> iy 1 (A 3 0 Xj) 

(Aj 0 X,) T [(c/sSsit/^ 0 Ujij (S 6 0 5 R + I 0 I) ([/T5 E i?7 s T 0 U T R 
(A, 0 X) T ((7 S 5 S -5J7^ ® £/ fl ) (S a 0 5 fl + / 0 I) -1 (c/T5 s -5LT s T 0 t/j) (A,- 0 X 
ATc/sSs-lt/^ 0 X7C7 R ) (5 e 0 S R + I 0 (u^S^-^U^Aj 0 C/^X^ 



£ ( [aJu s s^u ( 

c=l ^ 

p ✓ 

Y,([aJu s s^u, 

c=l ^ 

p / 

£ I [ATj7 s 5 s -^t/ ( 



xTt/ H ) (5 6 [ C! c]5 H + J) 1 



U^S^'U^ Aj 



U r Xj 



xJu r 



U^S^-^U^ 1 Aj 



E/ e S s -2l/ s 1 A 3 



2>(S e [c,c]S R + l) 1 UlX j 
xJUr {S d [c, c]S R + I)' 1 Ur~X 3 ) (1.40) 



Evaluating the term takes 0(PM 2 + PD 2 N) time. If the number of samples is smaller than the number 
of traits (N < P), we can also explicitly sum over the samples leading to a runtime of 0(ND 2 + NM 2 P). 
Inverting the left term has a runtime complexity of <3(.D 3 M 3 ). 
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1.5 Implementation details 



1.4.4 Predictions 



The mean predictor can be evaluated efficiently as follows 



J 



vec(M*) = i A i ® X j ) vec ( B j) + (C«fl*)(C®J?+S« iy 1 vec 7-^ (Aj ® X,-) vec(-Bj) 

J 

delg r ) ^ XjBjAj + (C ® H*) (C ® + £ ® I)" 1 vec(Y r ) 



5=1 

1.5 Implementation details 

OOP design LIMIX employs on object-oriented design that allows for flexibly combining different co- 
variance functions using operations. These covariance models can be used for parameter-inference, both 
using gradient-based optimization and using closed-form optimization of fixed effects. The core machinery 
is implemented in C++ and transparent interfaces to python permit to allow construction complex genetic 
models at ease and without having expert knowledge. 





(1.41) 
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2 Multivariate linear mixed models for statistical 
genetics 



2.1 Introduction 

In the most general formulation LIMIX models an N x P matrix Y of N samples for each of P phenotypes 
by a multivariate linear mixed model with J fixed effects {Fj} and / random effects {TJ{\ 

J I 

j=l i=l 

Each of the fixed effects and random effects is Kronecker structured. In particular, any fixed effect Fi 
can be written as (Aj ® X{) vec(-Bj) where A, G ~R PxM is the design matrix of the phenotypic effects, 
Xi G M. NxD is the design matrix of the sample-specific effects, and Bi G ~M. MxD is the matrix of the effect 
sizes; while any random effect Ui is matrix- variate normal distributed, U ~ Nnm ( 0 ; Ri , Cj), with row 
covariance matrix Ri G IR ArxAr and column covariance matrix Cj G IR- Px - P . i?j and Cj can be interpreted 
as sample-to-sample and trait-to-trait relatedness matrices due to contribution i respectively. 



All the models described by equation (2.1 ) can be dynamically built using LIMIX and subsequently fitted 



to the data using the optimization framework we introduced in Section 1. In the following subsection we 



show how some commonly used mixed models for genetic studies can be written as in (2.1). Afterwards 



we review common types of genetic studies that are supported by LIMIX. In Section 2.2 we present the 
framework made available by LIMIX to dissect the phenotypic variability across different sources and make 
out-of-sample phenotype predictions. As discussed in the main text, generalization performance measured 
by out-of-sample prediction can be used as a model selection criterion, given the wealth of models made 



available by LIMIX. In Section 2.3 we describe models for genome-wide association studies (GWAS) and 
show how they can be extended to build a multi-locus multivariate model. Finally, in Section |2.4| we 
discuss the PANAMA module in LIMIX, which can be used to infer non-observed (hidden) covariates so 
that they can be accounted for in GWAS and variance decomposition. 

2.1.1 Some recently proposed MTMM 



A particular, yet multivariate case of this general model is when the same trait design is used for all fixed 
effects and only two random effects describing the genetic and non-genetic contributions to the phenotype 
are considered: 

Y = (A®X)vec(B) + U + V, U ~ M NM ( 0 ; R, C) , * ~ N NM ( 0 ; I , S) (2.2) 

where R is the kinship matrix, which describes the genetic relatedness between the samples, C describes 
the genetic relatedness between the phenotypes, and 5] the relatedness between the phenotypes due to 
shared non-gentic effects, i.e. noise. Similar models have been recently employed in genetic analysis of 
multiple phenotypes 0 El O [7] . 



Considering the univariate version of the model in (2.2), the trait design matrix in the fixed effect 
collapses to 1 while the trait-to-trait covariance matrices reduce to single variance components. Indicating 
with y, (3, u, tp, a 2 g and o\ the univariate versions of Y, B, U, C and S the model reduces to 

y = X/3 + u + il>, u ~ N (0, a 2 g R) , * ~ N (0, <r 2 e l) (2.3) 



11 



2 Multivariate linear mixed models for statistical genetics 



Very similar models have been widely used in genetic studies for heritability estimation and GWAS [10, 



Instead of considering the effect from single variants as in (2.3), one can consider the effects from all 
SNPs in a region (e.g., a gene or a chromosome). This effect, r, can be modelled as random effect whose 
covariance matrix S is a region-based genetic relatedness matrix: 

y = r + w + </>, r ~ Af(0, orlS) ,u ~ M (0, a 2 g R) , * ~ Af(0, <rfl) (2.4) 

Similar ideas and models have been employed for general set tests |11| . rare variant testing [13], and 
heritability estimation |14j . 

2.2 Variance Decomposition 

In this section, we describe the variance decomposition framework made available by LIMIX that al- 



lows for dissecting the phenotypic variability across different sources for univariate (subsection 2.2.1) and 



multivariate analysis (subsection 2.2.2). In the same framework, LIMIX makes also available a tool for 



out-of-sample predictions, which we discuss in subsection 2.2.3 



2.2.1 Single-trait variance decomposition 

In the model for univariate variance decomposition, fixed effects reflect covariates while random effects 
contributions from different sets of variants (e.g., different chromosomes or local and distal genetic con- 
tributions). The covariance matrices of these contributions are empirical covariance matrices describing 
genetic relatedness based on the different sets. The approach can be generalized to include non-genetic 
random effects whose covariance matrices are known a priori, allowing for correction of covariates and 
joint estimation of genetic and non-genetic contributions to the phenotypic variability. The model can be 
written as: 



y 



X(3 + Y J u l + ^, u;~A/"(0, afRi) , * ~ AA (0, <r 2 e l) (2.5) 



where Ri is the covariance matrix for contribution i. If R is a genetic contribution from set of SNPs S, 
with a bit of abuse of notation we can write 

R=^W h sW hS T (2.6) 

where C = ^tr (W ) 5 i Wl j 5 i T ). The parameters of the model are the fixed effects (3 and the variance 
components af and a 2 . Variance explained by the fixed effect d can be retrieved by considering var {X.^d)- 
Since one is usually interested in the relative fractions of a particular component to the total variance, 
variance components are normalized to sum up to 1. 

2.2.2 Multiple-trait variance decomposition 



Using the notation introduced in Section |2.1[ the general model for multivariate variance decomposition 
can be written as 

Y = Y J (Ai®X t )vec(B i ) + Y,Ui + y, U t ~Af NM (0; R t , Q(a,)) > *~AW(0; I,S(a s )) 

i i 

(2.7) 

where Ui indicates the genetic effect from random effect i with trait-to-trait covariance matrix C« and 



sample-to-sample covariance matrix Ri introduced in (2.6), \l/ denotes the non-genetic contribution. We 
have indicated with ctj, the parameters of the trait-to-trait covariance matrices, which are estimated 
from the data. The parameters of the model are again the weight matrix of fixed effects B and the 
variance components a = {cti, as}. Although LIMIX handles incomplete designs and cases where 
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2.3 Genome-wide Association Studies 



more than 2 random effect terms are involved, in the case of 2 random effect terms and without missing 
values in the phenotype matrix Y LIMIX exploits the Kronecker structure in the model and implements 
the mathematical tricks discussed in the previous chapter to conduct fast estimation of parameters. 

In the most general model the trait-to-trait covariance matrices are general semi-definite positive ma- 
trices (freeform matrices) parameterized by \P(P + 1) real values. However, such a general form may 
lead to overfitting as the number of parameters increases quadratically in the number of traits while the 
data only linearly. LIMIX circumvents this problem in two ways: 

• it makes available to the user a set of different covariance matrices to perform optimization of the 
P x P covariance matrices; 

• it allows the user to introduce a regularization on the covariance matrices 

For a technical discussion about the different covariance matrices and possible regularization schemes made 
available by LIMIX we refer the user to the previous chapter, while here we discuss utility and interpre- 
tation of these. First of all, we highlight that using a diagonal matrix as trait-to-trait covariance matrix 
for random effect i is equivalent to assume that term i does not contribute to trait-to-trait correlations. 
A sum of a rank R matrix and spherical residual C(0) = AA T + a 2 1 (lowrank-id) , where A S M P ' R and 
6 = {vec (A) ,cr 2 }, might help address overfitting problems while modelling trait-to-trait correlations as 
the number of parameters scales linearly with the number of traits. On the other hand, this parametri- 
sation offers limited flexibility for the diagonal elements especially for low R, yielding biased heritability 
estimates. A more flexible form is the sum of a low-rank term and a trait-specific independent contribution 
C(6) = AA T + diag(c) (lowrank-diag), where 9 = {vec (A) , c} and c 6 M. p . The penalization on the 
off-diagonal entries of the trait-trait covariance matrix, or its inverse, bridges a fully independent model 
(diagonal matrix) to the unpenalized model. To select the extent of the penalization, one can use cross 



validation exploiting the phenotype prediction tools discussed in the next subsection (2.2.3). 

Finally, to dissect the contribution of a set of SNPs in shared and trait-specific effects, one can consider 
as trait-to-trait covariance matrix the sum of a block matrix, which describes pure shared effects, and a 
diagonal matrix, which describes pure trait-specific effects, C = a 2 lpp + diag(c 2 ). This covariance forms 
has been employed for the GxE variance decomposition in yeast. 

2.2.3 Phenotype Predictions 

The LIMIX variance decomposition model also implements a tool for phenotype predictions, which can 
be used to monitor overfitting and perform model selection. For example, this can help to select the set 
of fixed and random effects to consider, the best parametrisation of the trait-to-trait covariance matrices, 
or the best extent of the matrix penalization. 



Following (1.14), the predictive posterior mean of the contribution all terms in model (2.7) is 



vec(M*) = ( A i ® Xf) vec(H0 + (Q ® R*) K^vec I Y - (A» ® X { ) vec(Bi) j (2.8) 

i i \ i / 

where K = ^ i d <£> Ri + 5] <8> I and R* is the cross covariance for term i. For example, if R has the 
form in (2.6) then we have R* oc W- sW* s T G M. NN * where N and N* are the number of samples in the 
training and test sets respectively and W and W* are their genotype. 

2.3 Genome-wide Association Studies 

In this section we briefly describe the methods implemented in LIMIX to perform single and multi-locus 
analysis in GWAS. 

2.3.1 Univariate GWAS 

The standard LMM considered in GWAS is 

y ~ M (Wa + x/9, a 2 g (R + 51)) (2.9) 
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2 Multivariate linear mixed models for statistical genetics 



where x is the genotypic profile of the SNP being tested, f3 its effect size, R the sample-to-sample related- 
ness matrix and 5 = o\jo^ the signal-to-noise ratio. Testing for association of the SNP to the phenotype 
is done by testing /3 7^ 0. The 5 representation of the marginal likelihood allows fast scalable methods and 
has been widely used in GWAS (2j [15] . 

The variance decomposition tool from LIMIX can be used to build more complex forms of the relatedness 
matrix R. For example, considering the model (2.5) introduced in the previous section, we can estimate 
the variance components {o - ,}- by maximum likelihood or penalized maximum likelihood and then define 
a new relatedness matrix R = J2i^i-^-i/Yli^i that accounts for all contributions. This is equivalent to 
fix the contributions of each random effect term on the null model while the signal-to-noise ratio is still 
flexibly evaluated SNP-wise using the model in (2.9). However, building a too complex background model 



might explain away genetic signal leading to power reduction in GWAS if sufficient care is not take [16J. 
In all our experiments we just considered the full kinship matrix as relatedness matrix, with the exception 
of the transcript-based eQTL analysis where we accounted for hidden confounders by using PANAMA 
(section 2.4). 



2.3.2 Multivariate GWAS 

The multivariate version of 12.91 is 



Y ~ M ((A cov 0 W) vec {B 0 ) + (A x <g> x) vec (B) , 0* (C ® R + 5S ® I)) (2.10) 

where the trait-trait matrices C and X! can be estimated using the variance decomposition tool and then 
plugged in. This equals to estimate the variance parameters on the model with no association with the 
marker. However, LIMIX allows for SNP-specific estimates of the global variance of random effects o 2 g 



and the signal-to-noise ratio 5. The model in (2.10) is used to test for specific trait designs of the marker 



on the multivariate phenotype. To do so, the alternative model in (2.10) is compared to a null model 
which has the same form but characterized by a different trait design Ao for the marker effect. In the 
following we describe the choices of Aq and A% to conduct the tests that are most commonly considered 
in multivariate GWAS and some of their extensions. 



• Any effect test: this is a P degrees of freedom test where we test if the marker has an effect on at 
least one of the phenotypes (A\ = Ip and Aq = Op). 

• Common effect test: this is a one degree of freedom test, where in the alternative model the marker 
has the same effect size and direction across all phenotypes (A\ = lp) while the null model does 
not contain the effect from the marker (Aq = Op). 

• Specific effect test: this is a one degree of freeform test that we can use to test if the genetic marker 
acts specifically on the phenotype p. The design in the alternative model is a combination of a 
common effect and an independent effect for trait p A\ = [lp, l trait==p ], where l trait==p return a 
vector with 1 at element p and 0 at all other elements, while in the null model we just have the 
common effect, Aq = lp. 

• Any specific effect test: this is a P — 1 degrees of freeform test where we test whether the marker 
has a specific effect on at least one of the phenotypes. The alternative model describes an any effect 
from the marker (A\ = Ip) while the null model describes a common effect (Aq = lp). 

LIMIX flexibly allows considering more complex tests to really exploit the information in multivariate 
datasets. Here we describe an example of a more complex test. Suppose we want to jointly model T 
traits across E different environments, for a total of P = TE trait-environment combinations. In this 
example, we might be interested in discovering loci that affect multiple phenotypes differently across 
different environment. This is a T degree of freedom test and can be performed in LIMIX by considering 
an alternative model where the marker has different effect sizes across traits and environments (A\ = Ip) 
and a null model where the marker has same effect size across different environments for each trait 
(Aq = I T ® 
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2.4 PANAMA 



LIMIX supports also more general models for multivariate GWAS that handles non-Kronecker structured 
fixed effects and incomplete designs. However, such a model does not allow speed-ups and can be only used 
for the joint analysis of a few traits. The model implemented for such a task is practically the analogous 



of (2.9): 



y ~ M (Wat + a 0 0 x(3 0 + a x 0 xp 1} a 2 (K + 51)) (2.11) 

where y is multi-phenotype vector, obtained by concatenating all observed entries of Y, and an element- 
wise product 0 allows introducing non-Kronecker-structured designs. While the first fixed effect for SNP 
x is contained both in the full and the null model, we test for design a\ by testing fi\ ^ 0. For example, 
if I labels a categorical variable with values in {h,h, h}, setting ao = lp and a\ = 1^==^ we can test for 
ZixSNP interactions. 

2.3.3 Multi-locus GWAS 

All the models we introduced in this section can be extended to consider multiple loci by step- wise forward 
selection |12j . This is the first time multivariate analysis and step- wise forward selection are combined 
together. The general framework LIMIX employs multi-locus GWAS by performing the following two 
steps: 



1. a genome- wide scan is performed using the model in (2.10) 

2. if the most associated marker has P- value lower than a certain threshold the marker is added as 
covariate and the algorithm return to step 1, otherwise it stops here. 

LIMIX allows the user to 

• set the type of test to be performed in the genome-wide scans; 

• set the design of the SNPs that are added as covariates; 

• set a threshold either over the P-value or the Q-value of the most associated SNP as stopping 
criterion; 

• add a maximum number of iterations as stopping criterion; 

• update the parameters of the trait-to-trait covariance matrices after each inclusion using internally 
the variance decomposition module. 

2.4 PANAMA 

The PANAMA tool from LIMIX can be used to find a sample-by-sample covariance matrix that accounts 
for genetic and non-genetic confounders |17j . This can be done when a high number of traits are thought 
to share common confounding structure. This is the case of eQTL analysis, where thousands of genes 
most likely share sample-specific confounders. Reproducing the framework used in gaussian process latent 
variable models [Hl[T9j, we assume independence of the traits (e.g., gene expression in eQTL analysis) 
conditions on the latent factors. After normalizing all traits to z-scores, we again consider a particular 



model from the general form (2.1): 



y p = u + rj + ip, Vp (2.12) 

u ~ M (0, a 2 g R g ) , rj ~ M (0, R c (a)) , rj, ~ M (0, a 2 e l) 

where R g is the genetic relatedness matrix introduced before while R c (a) is the relatedness matrix due to 
unobserved covariates. The parameters of the model are {a, a 2 , a 2 }. As full rank matrices might overfit 
the data explaining away genetic signal, LIMIX models R c (a) as rank r matrix and allows the user to 
choose r. A sensible value for r can be chosen by looking at the number of PCAs of the sample-by-sample 
empirical covariance matrix explaining 70%-90% of the variance; see also discussion in |17j . 

Once R c is learned from the data, a new sample-to-sample relatedness matrix can be plugged in the 
GWAS tools both for single and multivariate analysis. 
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